Challenges and prospects of visual contactless physiological monitoring in clinical study

The monitoring of physiological parameters is crucial for promoting human health and is an indispensable approach for assessing physiological status and diagnosing diseases. It is particularly valuable for patients who require long-term monitoring or who have underlying cardiovascular disease. To this end, Visual Contactless Physiological Monitoring (VCPM) uses videos recorded by a consumer camera to monitor the blood volume pulse (BVP) signal, heart rate (HR), respiratory rate (RR), oxygen saturation (SpO2) and blood pressure (BP). Recently, deep learning-based pipelines have attracted many researchers and advanced rapidly. Although VCPM is still an emerging digital medical technology that presents many challenges and opportunities, it has the potential to transform clinical medicine, digital health, telemedicine and other areas. Owing to its contactless measurement, cost-effectiveness, user-friendly passive monitoring and sole requirement of an off-the-shelf camera, VCPM offers a viable solution that can be integrated into telemedicine systems for measuring vital parameters during video consultations. Studies of VCPM technologies, particularly AI-based approaches, have grown rapidly in recent years, but few have been deployed in clinical settings. Here, for the first time, we provide a comprehensive overview of the applications, challenges, and prospects of VCPM from the perspectives of clinical settings and AI technologies. This exploration and analysis of clinical scenarios aims to guide the research and development of VCPM technologies for clinical settings.


INTRODUCTION
Visual Contactless Physiological Monitoring (VCPM) is an emerging technology that measures vital signs from videos. VCPM has been shown to be effective in monitoring the blood volume pulse (BVP) signal, heart rate (HR), respiratory rate (RR), oxygen saturation (SpO2), and blood pressure (BP) [1][2][3][4]. More significantly, the contactless nature of VCPM offers clinical benefits such as user-friendliness, full automation, long-term monitoring, zero skin damage, improved clinical workflow efficiency and a greatly reduced risk of cross-infection. VCPM can also play a critical role in combating cardiovascular diseases (CVDs) and in supporting full-cycle personal health management 5,6.
As illustrated in Fig. 1, the basic physiological principles of VCPM are established on the cardiopulmonary and circulatory systems. As shown in Fig. 1a, the cardiopulmonary system transports blood between the heart and lungs, while the systemic circulation carries blood from the aorta through the systemic arteries. In blood circulation theory, blood ejected from the heart propagates along the arterial tree, and the BVP waveform exhibits typical morphological components corresponding to landmark events in the cardiac cycle (e.g., the contraction of the left ventricle and the dicrotic notch) 7. Since blood flow is regulated by cardiac and respiratory interactions, it is theoretically possible to extract various physiological parameters by analyzing a photoplethysmography (PPG) signal 7.
Figure 1 illustrates remote PPG (rPPG) technology, approaches to physiological parameter measurement, and the solution for cardiopulmonary status assessment, which together are broadly defined as VCPM technologies in this paper. As shown in Fig. 1b, because the BVP waveform can be detected by a camera, rPPG technology is capable of extracting PPG signals from videos of the skin. These PPG signals are then used to infer HR, RR, SpO2 and BP (Fig. 1c-e). SpO2 monitoring requires PPG signals measured at two or more wavelengths, together with a calibration. BP can be estimated from multi-site pulse transit time (PTT) (inferred from PPG waveforms at two distinct body sites), multi-wavelength PTT (from different skin layers), and morphological features. Finally, as shown in Fig. 1f, these vital signs can be used to assess the wellness of the cardiopulmonary system.
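The SpO2 and PTT routes described above can be illustrated with a small numerical sketch. It assumes the widely used ratio-of-ratios formulation with a linear calibration SpO2 ≈ a − b·R; the coefficients a = 110 and b = 25 are illustrative placeholders, not values from this paper, and would have to be fitted per camera against a reference oximeter. The two wavelength channels are simply called `red` and `ir` here. The PTT part estimates the delay between a proximal and a distal PPG trace by lagged correlation, assuming a high frame rate (250 fps), because transit times are short relative to a 30 fps frame interval. All function names are hypothetical.

```python
import numpy as np

def ratio_of_ratios(red: np.ndarray, ir: np.ndarray) -> float:
    """R = (AC_red / DC_red) / (AC_ir / DC_ir).

    AC is approximated by the standard deviation and DC by the mean of each
    PPG trace over the analysis window (one common simplification).
    """
    return float((red.std() / red.mean()) / (ir.std() / ir.mean()))

def spo2_linear(r: float, a: float = 110.0, b: float = 25.0) -> float:
    """Empirical linear calibration SpO2 = a - b * R.

    a and b are illustrative placeholders; a real system must fit them per
    camera/sensor against a reference pulse oximeter.
    """
    return a - b * r

def pulse_transit_time(proximal: np.ndarray, distal: np.ndarray,
                       fs: float, max_lag_s: float = 0.5) -> float:
    """Delay (s) at which the distal PPG best matches the proximal PPG,
    searched over non-negative lags up to max_lag_s."""
    p = proximal - proximal.mean()
    d = distal - distal.mean()
    n = p.size
    max_lag = int(max_lag_s * fs)
    # Mean lagged product: distal[k + m] compared against proximal[m].
    scores = [np.dot(d[k:], p[:n - k]) / (n - k) for k in range(max_lag + 1)]
    return int(np.argmax(scores)) / fs

# Toy usage with synthetic signals (not clinical data).
fs = 250.0                                    # assumed high-frame-rate camera
t = np.arange(int(10 * fs)) / fs
beat = np.sin(2 * np.pi * 1.2 * t)            # 72 beats/min
red = 100.0 + 0.5 * beat                      # weaker pulsatile component
ir = 100.0 + 1.0 * beat
r = ratio_of_ratios(red, ir)                  # 0.5 for these traces
spo2 = spo2_linear(r)                         # 110 - 25 * 0.5 = 97.5
distal = np.sin(2 * np.pi * 1.2 * (t - 0.1))  # distal site lags by 100 ms
ptt = pulse_transit_time(beat, distal, fs)
```

Mapping a PTT value to blood pressure additionally requires a per-subject calibration model, which is deliberately left out of this sketch.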
In 2000, Wu et al. proposed the first prototype of PPG imaging, using NIR light and a black-and-white camera 8. In 2007, researchers discovered that consumer RGB cameras can detect PPG waveforms under ambient light 9,10. Over the following decade, various camera-based rPPG algorithms built on conventional computer vision and signal processing made vigorous progress. Classic and popular algorithms include, but are not limited to, CHROM 11, PBV 12, POS 13 and S²R 14. In 2017, which can be called "the first year of deep learning for rPPG technology", researchers from the University of Oxford and Taipei University of Science and Technology presented deep learning work on newborn and adult subjects at the International Conference on Automatic Face and Gesture Recognition and the International Joint Conference on Biometrics, respectively 15,16. From 2021 to 2023, a variety of VCPM algorithms emerged for the continuous monitoring of premature infants, babies, ICU patients, elderly people, etc. [17][18][19][20]. Meanwhile, the number of AI-based studies with healthy subjects or in laboratory environments has increased exponentially. PPG waveforms derived from videos [20][21][22] can be used to infer HR 23,24, SpO2 25,26, RR 18,27 and BP 28,29, and for disease analysis [30][31][32].
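To make the signal-processing generation of rPPG algorithms concrete, here is a minimal sketch of POS 13 based on our reading of its published steps: temporal normalization of the mean skin color, projection onto a plane orthogonal to the skin tone, alpha tuning of the two projected signals, and overlap-add over sliding windows. The 1.6 s window, the synthetic pulse color direction, the helper names and the FFT-peak HR estimator are assumptions for illustration, not the reference implementation.

```python
import numpy as np

def pos_rppg(rgb: np.ndarray, fps: float, win_sec: float = 1.6) -> np.ndarray:
    """POS pulse extraction from per-frame mean skin color.

    rgb: array of shape (T, 3) with the spatially averaged R, G, B values
    of the skin ROI in each frame.
    """
    n_frames = rgb.shape[0]
    win = int(round(win_sec * fps))
    # Projection plane orthogonal to the temporally normalized skin tone.
    proj = np.array([[0.0, 1.0, -1.0],
                     [-2.0, 1.0, 1.0]])
    pulse = np.zeros(n_frames)
    for start in range(n_frames - win + 1):
        block = rgb[start:start + win]
        cn = block / block.mean(axis=0)           # temporal normalization
        s = cn @ proj.T                           # two projected signals
        alpha = s[:, 0].std() / (s[:, 1].std() + 1e-12)
        h = s[:, 0] + alpha * s[:, 1]             # alpha tuning
        pulse[start:start + win] += h - h.mean()  # overlap-add
    return pulse

def estimate_hr_bpm(signal: np.ndarray, fps: float,
                    lo_hz: float = 0.7, hi_hz: float = 4.0) -> float:
    """Heart rate from the dominant spectral peak inside [lo_hz, hi_hz]."""
    x = signal - signal.mean()
    spec = np.abs(np.fft.rfft(x * np.hanning(x.size)))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    return 60.0 * float(freqs[band][np.argmax(spec[band])])

# Synthetic skin-color trace: 72 beats/min pulse, slow illumination drift,
# and sensor noise (toy data, not a real recording).
fps = 30.0
t = np.arange(int(20 * fps)) / fps
rng = np.random.default_rng(0)
pulse_dir = np.array([0.33, 0.77, 0.53])           # assumed pulse color direction
illum = 1.0 + 0.05 * np.sin(2 * np.pi * 0.1 * t)   # intensity drift, all channels
base = np.array([180.0, 120.0, 100.0])
rgb = base * (1.0 + 0.005 * np.outer(np.sin(2 * np.pi * 1.2 * t), pulse_dir))
rgb = rgb * illum[:, None] + rng.normal(0.0, 0.05, rgb.shape)
hr_bpm = estimate_hr_bpm(pos_rppg(rgb, fps), fps)
```

Because both rows of the projection matrix are orthogonal to [1, 1, 1], a brightness change shared by all channels is largely cancelled, which is why the illumination drift above does not dominate the recovered pulse.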
The COVID-19 pandemic over the past three years (2020-2022) has expedited the revolution of digital medicine [33][34][35][36][37]. The use of telemedicine systems grew exponentially in numerous countries of the Organization for Economic Co-operation and Development (OECD) throughout 2020 38. Compared with 2019, the number of Medicare fee-for-service beneficiary telehealth visits in the United States increased 63-fold in 2020, reaching nearly 52.7 million 39. Similarly, almost 1.4 million video consultations were conducted in Germany during the first half of 2020 40; in the second quarter of 2020 alone, patients consulted doctors or psychotherapists via video almost 1.2 million times 40.
Most importantly, COVID-19 has changed the context of digital medicine and promoted the development of telemedicine and Primary Health Care (PHC) systems 34,35,37,38,41. Governments have paid increasing attention to digital medicine and telemedicine systems, and patients have gradually accepted this approach to treatment 37,38,42. During the COVID-19 outbreak, many countries and regions suffered from a shortage of essential vital-sign monitoring equipment, particularly blood oxygen monitors. Because blood oxygen saturation is a critical biomarker in COVID-19, patients under home quarantine can use it to judge whether they are developing lung infections or severe illness 35,[43][44][45]. Since VCPM technologies measure vital signs with off-the-shelf devices such as webcams or smartphone cameras, they could help alleviate such shortages of medical equipment. For these reasons, VCPM technologies offer a natural and cost-effective approach to establishing digital medicine or PHC systems.
VCPM technologies based on deep learning have made tremendous progress in recent years, but the majority of these studies are limited to laboratory settings or healthy subjects, leaving considerable room for improvement before these technologies can be applied in clinical medicine. The motivation of this review is therefore to re-examine the application of VCPM technologies in clinical healthcare monitoring, summarize the challenges and issues encountered, and strengthen the fundamental theory of VCPM in clinical settings. Moreover, we outline the prospects of developing VCPM algorithms based on state-of-the-art (SOTA) artificial intelligence technologies. Overall, this review offers guidelines for the future development of VCPM algorithms toward clinical-grade applications.
The rest of the paper is organized into four parts. First, the search results for existing relevant works and the study characteristics are described. Then, we discuss the revolution of digital medicine, the merits of VCPM technology and the necessity of clinical settings in general. Next, the main challenges in clinical studies are illustrated in detail. Finally, the future directions and prospects of VCPM are presented at length.

RESULTS
In this section, the future directions of VCPM are summarized from three perspectives: the adoption of SOTA deep learning technologies, breakthroughs in current limitations, and clinical application challenges. The framework of this section is organized as shown in Fig. 2. The establishment of national and international standards for VCPM systems is of top importance, and the remaining topics can be divided into AI technology, clinical application and other aspects.

Unsupervised learning
Unsupervised learning can be used to build AI models for the VCPM task without relying on ground-truth vital signs during training. Furthermore, unsupervised methods are often more robust to noise and variations in data, which makes them attractive for real-world applications. In fact, not only are the vital-sign signals hidden in skin videos weak, but their spatio-temporal features are also intertwined 46. Hence, explicitly designing neural network structures or loss functions that effectively decouple these spatio-temporal vital-sign features is a significant challenge. Nevertheless, it is feasible to construct a reasonable unsupervised learning strategy that enables the model to learn on its own and disentangle the intertwined features. For instance, recent research has demonstrated that unsupervised learning technologies can extract rPPG signals from unlabeled video data [47][48][49][50][51][52][53]. Moreover, the performance of several of these algorithms [48][49][50][51]54 is comparable to or even better than that of supervised approaches.
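One way such label-free strategies are often built is a spectral contrastive objective: rPPG clips predicted from the same video should share a power spectrum (same underlying HR), while clips from different videos should not. The NumPy sketch below only evaluates such an objective on given signals; it is written in the spirit of the cited unsupervised methods, not as any specific one of them, and in practice the loss would be computed in a differentiable framework on network outputs. The band limits, clip length and function names are assumptions.

```python
import numpy as np

def normalized_psd(x: np.ndarray, fps: float,
                   lo_hz: float = 0.66, hi_hz: float = 3.0) -> np.ndarray:
    """Power spectrum restricted to a plausible heart-rate band, summing to 1."""
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    p = power[band]
    return p / (p.sum() + 1e-12)

def contrastive_psd_loss(same_video, other_video, fps: float) -> float:
    """Pull PSDs of clips from one video together (positives) and push PSDs
    of clips from another video apart (negatives). No HR labels are used.
    All clips must have the same length so their PSD vectors align."""
    pa = [normalized_psd(c, fps) for c in same_video]
    pn = [normalized_psd(c, fps) for c in other_video]
    pos = np.mean([np.mean((a - b) ** 2)
                   for i, a in enumerate(pa) for b in pa[i + 1:]])
    neg = np.mean([np.mean((a - b) ** 2) for a in pa for b in pn])
    return float(pos - neg)

# Toy clips standing in for network outputs (10 s at 30 fps).
fps = 30.0
t = np.arange(300) / fps
rng = np.random.default_rng(0)

def clip(f_hz: float, phase: float) -> np.ndarray:
    return np.sin(2 * np.pi * f_hz * t + phase) + 0.1 * rng.normal(size=t.size)

video_a = [clip(1.2, 0.0), clip(1.2, 1.0)]   # one subject, 72 bpm
video_b = [clip(1.8, 0.5), clip(1.8, 2.0)]   # another subject, 108 bpm
loss_diff = contrastive_psd_loss(video_a, video_b, fps)
loss_same = contrastive_psd_loss(video_a, [clip(1.2, 0.3)], fps)
```

Because the PSD discards phase, two clips of the same pulse recorded at different moments still count as positives, which is what lets the objective work without synchronized ground truth.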

Federated learning
Federated Learning (FL) is a distributed machine learning paradigm proposed to address the data-island problem that arises from privacy protection. FL enables joint modeling without sharing participants' data: the training data remain in each participant's local storage, which preserves user privacy and complies with data usage standards 55,56. FL has arguably become the most widely used privacy-preserving technique in AI-based medical applications [56][57][58][59][60][61][62]. Overall, FL is a promising solution to privacy protection that can promote the R&D of VCPM for multicenter clinical application studies. For instance, Liu et al. first developed a mobile FL camera-based PPG monitoring system using non-clinical public databases and showed that it performs competitively with traditional state-of-the-art supervised learning methods 63.
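The core FL loop can be sketched with Federated Averaging (FedAvg), one standard FL algorithm; the cited VCPM work may use a different aggregation scheme. In this sketch, a linear model stands in for an rPPG network, the three "hospitals" and their data are synthetic, and all names are hypothetical. Each site trains locally from the current global weights, and only the weights are sent to the server, which averages them in proportion to each site's sample count.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """A few epochs of full-batch gradient descent on one site's data
    (mean squared error of a linear model)."""
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def fedavg_round(w_global, clients, lr=0.1, epochs=5):
    """One FedAvg round: every site trains locally starting from the global
    weights, and the server averages the results weighted by sample count.
    Only weights leave the sites; X and y never do."""
    updates = [local_update(w_global.copy(), X, y, lr, epochs) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    return np.average(np.stack(updates), axis=0, weights=sizes)

# Three hypothetical hospitals with private data drawn from the same relation.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (50, 120, 80):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w + rng.normal(0.0, 0.1, n)))

w = np.zeros(2)
for _ in range(20):
    w = fedavg_round(w, clients)
```

After 20 rounds the global weights approach the shared relation even though no site ever exposed its raw data; real multicenter VCPM data would be far more heterogeneous than this toy setup.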

Skin segmentation and temporal consistency
Face occlusion and lateral face orientation are common in clinical settings, and skin segmentation can effectively alleviate these unfavorable conditions. ROI extraction or skin segmentation is a critical preprocessing step for the VCPM task because only the skin surface carries information about blood volume changes. In earlier studies, various facial landmark detectors [64][65][66] were used to locate the ROI 67. However, these methods have proven ineffective in scenarios involving head movement or face occlusion 17,68,69. Additionally, Ouzar et al. demonstrated that face detectors [64][65][66] may fail to detect the ROI in the MMSE-HR dataset 70, whereas this issue can be resolved by adopting a face segmentation algorithm 69,71.
Fig. 2 The pipeline of the future direction topics.
Accurate segmentation of the skin ROI is important for extracting vital signs effectively. Skin segmentation can also reduce noise and variation in the raw data, thereby improving the accuracy of VCPM algorithms and enabling vital-sign extraction even under non-ideal conditions. Skin segmentation is especially important in clinical settings, where occlusion of the face is more prevalent among ICU patients. As shown in Fig. 3a, skin segmentation separates the skin from the background at the pixel level, effectively mitigating interference from non-skin regions and enhancing the accuracy and robustness of the VCPM algorithm. The skin segmentation results on a simulated clinical dataset, generated with the Segment Anything Model (SAM) online demo (https://segmentanything.com/demo), are presented in Fig. 3.
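Why segmentation matters for the downstream signal can be seen in a small sketch: the per-frame RGB value fed to an rPPG algorithm is the average over the skin mask, so any background pixels inside the ROI dilute it. For illustration only, the mask below comes from a classic fixed YCrCb skin-color threshold (BT.601 conversion, Cr in [133, 173], Cb in [77, 127]), a far cruder technique than SAM or the learned segmenters discussed in the text; the frame and function names are hypothetical.

```python
import numpy as np

def ycrcb_skin_mask(frame_rgb: np.ndarray) -> np.ndarray:
    """Boolean skin mask from fixed Cr/Cb thresholds (BT.601 full-range).
    A crude stand-in for a learned skin segmentation model."""
    rgb = frame_rgb.astype(float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    return (cr >= 133) & (cr <= 173) & (cb >= 77) & (cb <= 127)

def masked_rgb_mean(frame_rgb: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average R, G, B over skin pixels only (NaN if no skin is found)."""
    if not mask.any():
        return np.full(3, np.nan)
    return frame_rgb[mask].astype(float).mean(axis=0)

# Toy frame: a skin-colored patch on a bluish background (e.g., bedding).
frame = np.zeros((64, 64, 3), dtype=np.uint8)
frame[:] = (30, 60, 200)
frame[16:48, 16:48] = (200, 140, 120)

mask = ycrcb_skin_mask(frame)
skin_mean = masked_rgb_mean(frame, mask)          # skin color only
whole_mean = frame.reshape(-1, 3).mean(axis=0)    # contaminated by background
```

In a real video the background term would also carry its own illumination and motion fluctuations into the averaged trace, which is the noise that pixel-level segmentation is meant to exclude.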
Although pioneering researchers have introduced segmentation algorithms, the temporal consistency of segmentation across consecutive video frames has not been taken into consideration 22,69. Unlike in single-image segmentation, temporal consistency is a critical property for video that can significantly affect the performance of VCPM algorithms. Temporal consistency ensures that the segmentation of each frame remains consistent with that of previous frames, which is crucial for accurately tracking skin regions in videos over time. If the temporal consistency of the skin segmentation approach is poor, the segmentation itself may introduce extra noise, ultimately degrading the performance of the VCPM algorithm. AI-based optical flow is a potential research direction for improving the temporal consistency of video skin segmentation.
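Temporal consistency can be quantified in a simple way, for example as the mean intersection-over-union (IoU) between masks of consecutive frames. The sketch below computes that metric and, as a deliberately simple stand-in for the optical-flow direction mentioned above (it is not optical flow), stabilizes a flickering mask sequence with an exponential moving average followed by re-thresholding. The smoothing factor, threshold, synthetic flicker model and function names are all assumptions.

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else float(np.logical_and(a, b).sum() / union)

def temporal_consistency(masks) -> float:
    """Mean IoU between consecutive frame masks (1.0 = perfectly stable)."""
    return float(np.mean([iou(masks[i], masks[i + 1])
                          for i in range(len(masks) - 1)]))

def ema_smooth(masks, alpha: float = 0.3, thresh: float = 0.5):
    """Exponential moving average of binary masks, re-thresholded per frame."""
    state = masks[0].astype(float)
    out = [state >= thresh]
    for m in masks[1:]:
        state = alpha * m.astype(float) + (1.0 - alpha) * state
        out.append(state >= thresh)
    return out

# Static skin region whose per-frame mask flickers (10% of pixels flipped).
rng = np.random.default_rng(0)
truth = np.zeros((64, 64), dtype=bool)
truth[20:40, 20:40] = True
noisy = [truth ^ (rng.random(truth.shape) < 0.1) for _ in range(50)]

raw_score = temporal_consistency(noisy)
smooth_score = temporal_consistency(ema_smooth(noisy))
```

An EMA assumes the subject is nearly still; when the head moves, previous masks must first be warped to the current frame, which is where learned optical flow would come in.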
In 2023, several new segmentation tools 72,73 were published, offering good prospects for further improving VCPM algorithm performance. Fig. 3 presents the results of facial skin segmentation using SAM 72 on ICU patients under various complicated clinical conditions. The segmentation results show that the background area is completely removed and the skin area is well preserved at pixel-level precision. Thus, with the aid of advanced segmentation tools, clinical VCPM algorithms can be trained with less background noise and more effective data, increasing the feasibility of practical application.
Establishment of the national and international standard of the VCPM system
VCPM technology is an accessible, comfortable, and convenient approach for physiological monitoring. To prevent potential abuse or misuse of VCPM, it is essential to establish national and international standards and guidelines for its use in digital medicine. In terms of algorithm performance and data security, the standards should at least encompass the following aspects. Additionally, the recommended settings for the clinical application of VCPM technologies are listed in Table 1.
• Video capture software and hardware settings. The encoding format of the recorded video is a crucial parameter for VCPM algorithms to extract vital signs: if the compression ratio is too high, the weak physiological signals embedded in the video may be lost. Because vital signs are time-domain information, a stable and consistent video sampling rate is required. In addition, video resolution is a crucial factor in ensuring video quality and minimizing noise. Hence, it is imperative to develop specialized video recording software that configures camera settings to ensure optimal performance of the VCPM algorithm.
• Standard operating procedure (SOP). The SOP includes the minimum ambient light intensity, the maximum allowable subject motion, the minimum video duration and other considerations.
• Data privacy protection. Because video data commonly contain both the facial information and the vital signs of subjects, privacy protection is a crucial issue and a primary requirement. Criteria for video data accessibility should be formulated according to purpose and occupational category. The relevant categories include members of the public, physicians, researchers, pertinent government staff and policy makers.
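The sampling-rate requirement in the first item above can be checked in software from per-frame timestamps. The sketch below, with hypothetical names and an assumed 10% tolerance (not a value from Table 1), reports the worst deviation of the frame interval from the nominal one and resamples the per-frame signal onto a uniform time grid by linear interpolation, which is one common way to repair moderate jitter before spectral analysis.

```python
import numpy as np

def check_and_resample(timestamps_s, values, target_fps, max_rel_jitter=0.1):
    """Report the worst frame-interval jitter (relative to 1/target_fps) and
    resample a per-frame signal onto a uniform grid by linear interpolation.
    max_rel_jitter is an assumed tolerance, not a standardized value."""
    ts = np.asarray(timestamps_s, dtype=float)
    vals = np.asarray(values, dtype=float)
    nominal = 1.0 / target_fps
    rel_jitter = float(np.max(np.abs(np.diff(ts) - nominal)) / nominal)
    t_uniform = np.arange(ts[0], ts[-1], nominal)
    resampled = np.interp(t_uniform, ts, vals)
    return t_uniform, resampled, rel_jitter, rel_jitter <= max_rel_jitter

# Simulated 30 fps capture whose frame timestamps wobble by up to +/-5 ms.
fps = 30.0
rng = np.random.default_rng(0)
ts = np.arange(300) / fps + rng.uniform(-0.005, 0.005, 300)
trace = np.sin(2 * np.pi * 1.2 * ts)          # per-frame rPPG sample
t_u, trace_u, jitter, ok = check_and_resample(ts, trace, fps)
```

Resampling only helps when timestamps are recorded reliably; dropped or duplicated frames without timestamps cannot be repaired this way, which is one argument for dedicated recording software.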

Disease analysis, diagnosis and cardiopulmonary status assessment
With the aid of the AI technology boom, PPG, HR and heart rate variability can serve as biomarkers for disease analysis, diagnosis, and assessment of cardiopulmonary status [30][31][32][74][75][76]. Recently, numerous studies have demonstrated the high sensitivity and specificity of VCPM technologies in detecting atrial fibrillation 75,[77][78][79][80][81]. Additionally, ref. 32 developed a novel AI algorithm that leverages PPG and ECG generated from PPG for CVD detection, including coronary artery disease, congestive heart failure, myocardial infarction (MI), and hypotension (HOTN). Using VCPM technologies to monitor vital signs and capture their variations over days and weeks therefore holds great potential for early disease prediction and diagnosis 35,82.
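Heart rate variability biomarkers are computed from inter-beat intervals (IBIs) between successive pulse peaks. The sketch below derives three standard time-domain measures, mean HR, SDNN (standard deviation of the intervals) and RMSSD (root mean square of successive differences), from a PPG or rPPG trace using SciPy's `find_peaks`. The 180 bpm upper limit used to set a minimum peak spacing and the function name are assumptions. Note that at 30 fps each interval is quantized to about 33 ms, so practical video-based HRV usually needs interpolation around the peaks.

```python
import numpy as np
from scipy.signal import find_peaks

def hrv_from_ppg(ppg: np.ndarray, fps: float, max_hr_bpm: float = 180.0) -> dict:
    """Mean HR, SDNN and RMSSD from systolic peaks of a PPG/rPPG trace."""
    min_dist = max(1, int(fps * 60.0 / max_hr_bpm))   # minimum samples between beats
    peaks, _ = find_peaks(ppg, distance=min_dist)
    ibi_ms = np.diff(peaks) / fps * 1000.0             # inter-beat intervals (ms)
    return {
        "mean_hr_bpm": float(60000.0 / ibi_ms.mean()),
        "sdnn_ms": float(ibi_ms.std(ddof=1)),
        "rmssd_ms": float(np.sqrt(np.mean(np.diff(ibi_ms) ** 2))),
    }

fps = 30.0
t = np.arange(int(30 * fps)) / fps
metrics = hrv_from_ppg(np.sin(2 * np.pi * 1.25 * t), fps)   # perfectly regular 75 bpm
```

For this perfectly regular toy pulse both variability measures are essentially zero; irregular rhythms such as atrial fibrillation raise RMSSD and SDNN, which is why such features are common inputs to AF detectors.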

Multiple vital-sign measurement
Although VCPM technology has been verified for measuring HR, RR, SpO2, and BP, a multi-task AI model capable of detecting all four vital signs simultaneously has not yet emerged. In clinical settings, however, multiple vital signs must be monitored concurrently to ensure comprehensive monitoring of a patient's physiological status. HR, RR, SpO2 and BP are the four essential parameters that comprehensively reflect the cardiopulmonary status of patients, and they are the fundamental indicators of traditional multi-parameter patient monitors. If a VCPM framework can monitor multiple vital parameters simultaneously, it will come closer to fully non-contact patient monitoring in highly acute settings. Research on multi-parameter AI models would therefore be a crucial breakthrough for real-world clinical applications. Currently, most researchers focus on contactless measurement algorithms for a single physiological parameter or for two strongly correlated parameters, such as HR and RR. For instance, Villarroel and Jorge et al. developed two AI models capable of monitoring HR and RR in clinical conditions 18,68.
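The close coupling of HR and RR can be illustrated by reading both from a single pulse trace. The sketch assumes that breathing appears as a slow baseline modulation of the PPG, one of several respiratory modulations exploited in the literature, and simply takes the dominant spectral peak in a cardiac band (0.7 to 4 Hz) and a respiratory band (0.1 to 0.5 Hz). Band limits, the synthetic signal and function names are assumptions; this is not the multi-task model the text calls for.

```python
import numpy as np

def dominant_freq_hz(x: np.ndarray, fs: float, lo_hz: float, hi_hz: float) -> float:
    """Frequency of the largest Hann-windowed spectral peak inside [lo_hz, hi_hz]."""
    x = x - x.mean()
    spec = np.abs(np.fft.rfft(x * np.hanning(x.size)))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    return float(freqs[band][np.argmax(spec[band])])

def hr_and_rr(ppg: np.ndarray, fs: float):
    """HR (beats/min) and RR (breaths/min) from one PPG/rPPG trace, assuming
    respiration shows up as baseline wander of the pulse signal."""
    hr = 60.0 * dominant_freq_hz(ppg, fs, 0.7, 4.0)
    rr = 60.0 * dominant_freq_hz(ppg, fs, 0.1, 0.5)
    return hr, rr

# 60 s synthetic trace: 72 bpm pulse riding on a 15 breaths/min baseline wander.
fs = 30.0
t = np.arange(int(60 * fs)) / fs
ppg = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.sin(2 * np.pi * 0.25 * t)
hr, rr = hr_and_rr(ppg, fs)
```

A window of about a minute is used because respiratory frequencies are low; a short window would give too coarse a spectral resolution for RR.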

Establishment of public health early warning and decision system based on VCPM
As illustrated in Fig. 4, a VCPM-based telemedicine system could be used not only for personal health monitoring and disease diagnosis, but also as an AI tool for responding to public health issues, such as CVD in the elderly and the COVID-19 pandemic. First, the vital signs of individuals measured by the VCPM-based telemedicine system can be used for personalized healthcare and disease diagnosis. Moreover, during an epidemic, the large-scale basic data collected from the public by the VCPM telemedicine system can be used to establish a public health decision-making system, offering crucial technical support for governments in formulating timely response strategies. For instance, numerous AI prediction models have been developed to predict the infected population and mortality [100][101][102].

Integration into telemedicine system for clinical application
In telemedicine or telehealth systems, video consultation is a must-have function, although it is a subjective approach to disease counseling. Since the COVID-19 pandemic (2020 to 2022), many telemedicine/telehealth systems have adopted video consultation to mitigate cross-infection risks between healthcare providers and patients during treatment for COVID-19 or other illnesses [103][104][105][106]. VCPM can therefore be easily integrated into these existing medical systems to provide objective physiological information during video consultations with physicians. Such integration would further enhance the functionality of telemedicine systems and improve the experience of their users.

Other recent research directions
Database synthesis method. Collecting a large-scale, multi-center database that represents a range of environments, body movements, illumination conditions and physiological states is a highly challenging task. Establishing a simulated video database integrated with physiological signals is a feasible solution for VCPM tasks 84,107,108. For instance, in 2022, McDuff et al. released a synthetic database named SCAMPS, comprising 2,800 videos with synchronized cardiac and respiratory signals as well as facial action intensities 109. Synthetic data also have the merits of being noise-free and precisely synchronized. SCAMPS has been used successfully to train AI models for VCPM algorithms on healthy subjects 52,110. Developing a simulated database for clinical settings would therefore be an invaluable future direction, as it can mitigate the challenges of collecting extensive clinical data while safeguarding medical data privacy. It should be noted that the mathematical modeling …
… implemented through video conferencing, photo calling or special telemedicine software. Telemedicine is a component of the broader field of telehealth.
• Remote Patient Monitoring (RPM). RPM is a comprehensive technology solution that uses sensors and other devices to remotely gather data on a patient's health status, which can then be transmitted to healthcare providers for analysis and intervention. VCPM can be considered one RPM technique. RPM can be applied to monitor various conditions such as heart failure, diabetes, COVID-19 35, and interstitial lung disease 33.
• Self-monitoring and home-based monitoring.Self-monitoring refers to the utilization of digital tools and devices by patients to track their own health data, such as blood pressure monitors, glucose meters, and fitness trackers.This practice enables patients to proactively manage their health and detect potential diseases at an early stage.Home-based monitoring involves leveraging digital technologies to deliver healthcare services directly to patients in their residences.
The measurement of physiological signals is a fundamental procedure for monitoring the body's status and is widely employed in clinical settings and daily health surveillance. As a contactless measurement method, VCPM offers the benefits of user-friendly, passive and cost-effective monitoring.
Therefore, it has significant potential for application in clinical settings and home-based monitoring, and is poised to revolutionize traditional medical devices, telemedicine, intelligent monitoring, and the medical industry.
As illustrated in Fig. 5, neonatal physiological monitoring instruments are shifting from wired contact monitoring to wireless contact monitoring [118][119][120], and ultimately towards contactless measurement. In 2022, an AI-based contactless physiological monitoring algorithm was developed for post-operative patients in ICU settings. In that study, the VCPM algorithm measured HR with a mean absolute error (MAE) of 2.5 beats/min relative to two reference HR sensors, and RR with an MAE of 2.4 breaths/min relative to the reference value computed from the chest impedance pneumogram 18.
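MAE values like those above are one of several agreement statistics used when validating a contactless method against reference sensors. The sketch below computes MAE alongside RMSE, Pearson correlation and the Bland-Altman bias with 95% limits of agreement (bias ± 1.96 SD of the differences). The four readings are made-up toy numbers, not data from the cited study, and the function name is hypothetical.

```python
import numpy as np

def agreement_metrics(estimate, reference) -> dict:
    """Common agreement statistics between a VCPM estimate and a reference."""
    est = np.asarray(estimate, dtype=float)
    ref = np.asarray(reference, dtype=float)
    err = est - ref
    bias = err.mean()
    sd = err.std(ddof=1)
    return {
        "mae": float(np.abs(err).mean()),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "pearson_r": float(np.corrcoef(est, ref)[0, 1]),
        "bias": float(bias),                     # Bland-Altman mean difference
        "loa_low": float(bias - 1.96 * sd),      # lower 95% limit of agreement
        "loa_high": float(bias + 1.96 * sd),     # upper 95% limit of agreement
    }

# Toy HR readings in beats/min (illustrative only).
m = agreement_metrics([70, 72, 75, 80], [71, 70, 75, 78])
```

Reporting the limits of agreement alongside MAE shows not only the average error but also how large individual errors can plausibly be, which matters for clinical alarm thresholds.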
(B) The merits of VCPM technologies
First, VCPM possesses a number of inherent and potential advantages. As illustrated in Fig. 6a, VCPM offers contactless and noninvasive monitoring, passive measurement, user-friendliness, comfort and convenience, as well as suitability for long-term monitoring. Second, it leverages ubiquitous devices and internet infrastructure already at hand, including smartphones, webcams, and telecommunications systems. VCPM is therefore a natural method for establishing telemedicine or home-based monitoring systems, and it has the potential to yield significant economic and social benefits, including but not limited to preventing cross-infection among individuals/patients, reducing patients' costs 121, and promoting equitable distribution of medical resources.

The merits of clinical applications
Owing to the prevalence of camera devices and the convenient manner of monitoring, VCPM technologies have the potential to flexibly record large-scale public disease data as well as individuals' physiological information. Big data and AI technologies have played a significant role in studying and recognizing new diseases (e.g., predicting infection rate and mortality) by utilizing large-scale vital-sign data from the public 41. On the one hand, at the population level, VCPM establishes horizontal relationships between patients and providers, and makes multinational collaboration more feasible 41. Moreover, as illustrated in Fig. 7, VCPM technologies have broad applications in various clinical scenarios, such as elderly care, newborn monitoring, ICU patient healthcare and rehabilitation training.
On the other hand, for individuals, VCPM can easily be deployed at scale to track longitudinal changes, which are crucial medical indicators of physiological status. Individuals undergo their own daily, weekly, and seasonal fluctuations in a variety of physiological parameters and activities, and the earliest deviations from the norm can be detected only by establishing an individual's baseline while they are healthy 44. Owing to its flexible and passive manner, VCPM is therefore significant for monitoring patients' physiological parameters whether they are at home or in hospital. For example, patients can transmit their skin videos to an AI physiological signal monitoring system, and a physician can work remotely based on the resulting vital signs. These advancements may encourage patients and their families to take greater ownership of their own healthcare. Such a system could reduce medical costs, decrease reliance on specialized equipment and physicians, promote equal distribution of medical resources, and improve the quality of healthcare services.
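The idea of detecting deviations against a personal baseline can be sketched with a simple z-score rule: statistics from an initial period assumed to be healthy define the individual's norm, and later daily values far from it are flagged. The 14-day baseline, the threshold of 3 standard deviations, the daily resting-HR values and the function name are all hypothetical choices for illustration; real systems would also need to model the daily and seasonal fluctuations mentioned above.

```python
import numpy as np

def personal_baseline_alerts(daily_values, baseline_days=14, z_thresh=3.0):
    """Flag days that deviate from a person's own healthy baseline.

    The first baseline_days values are assumed to come from a healthy period;
    later days are scored as z = (x - mean) / sd against that baseline.
    """
    x = np.asarray(daily_values, dtype=float)
    base = x[:baseline_days]
    mu, sd = base.mean(), base.std(ddof=1)
    z = (x - mu) / sd
    flags = np.abs(z) > z_thresh
    flags[:baseline_days] = False   # baseline days are not scored
    return z, flags

resting_hr = [61, 63, 62, 60, 64, 62, 61, 63, 62, 60, 64, 62, 61, 63,  # baseline
              62, 63, 74, 76]                                          # follow-up
z, flags = personal_baseline_alerts(resting_hr)
alert_days = np.flatnonzero(flags).tolist()
```

A resting HR of 74 bpm is unremarkable at the population level, yet it stands out clearly against this person's own baseline, which is the point made above about individualized monitoring.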

The advantages of AI-based VCPM approaches
First, using SOTA AI technologies to develop VCPM algorithms is a clear trend: by our informal count, the vast majority of SOTA VCPM algorithms released from 2020 to 2023 are based on AI techniques. Moreover, the performance of deep learning methods has far exceeded that of signal processing methods. Second, AI technologies can be used not only to develop approaches for estimating multiple vital parameters, but also to advance post-processing solutions, such as disease diagnosis based on individual longitudinal analysis and horizontal comparison with similar patients 44. In addition, AI-based VCPM solutions with built-in privacy protection (e.g., AI-based privacy-preserving approaches 55,57,58) are highly appealing for clinical applications. Overall, AI-based VCPM solutions hold immense potential and offer significant advantages in clinical applications and digital medicine.

The opportunities of VCPM-based digital medicine system
VCPM will significantly expand the application range and scenarios of telemedicine and telehealth systems. A telemedicine/telehealth system typically encompasses the fundamental capabilities of biosignal measurement and video consultation. As illustrated in Fig. 6a, the VCPM approach uses existing infrastructure, such as webcams and the Internet, to establish a telemedicine/telehealth system without specialized medical equipment. Therefore, as shown in Fig. 6b, VCPM is particularly well suited to building a telemedicine system that integrates remote patient monitoring, home-based monitoring, and video consultation. It can be regarded as one of the essential underlying technologies for telemedicine systems, especially for supporting remote patient monitoring and home-based healthcare.
Ultimately, VCPM technology offers an unprecedented opportunity for self-health monitoring, PHC 41 and telemedicine systems, owing to its distinctive merits: contactless operation, user-friendliness, low cost and no need for medical professionals. VCPM can be applied across the entire spectrum of prevention, diagnosis, and treatment. In the disease prevention stage, it is a competent method for facilitating self-monitoring of physiological signals and health status assessment. Furthermore, it can be integrated into PHC and telemedicine systems to supply fundamental physiological data for diagnosis. As a tool for monitoring the body's physiological state, it can be applied widely, as illustrated in Fig. 7.

(C) The necessity of clinical settings
First, multi-parameter monitoring is a necessary approach to maintaining the life and health of preterm and newborn infants. In 2020, the World Health Organization reported that approximately 35% of all under-5 deaths occurred within the first week after birth 122. For newborn or preterm infants, vital-sign monitoring is a fundamental clinical requirement because the fetal-to-neonatal transition after birth is a complex physiological process that affects all organ systems [123][124][125][126]. It is also an indispensable procedure in the neonatal intensive care unit (NICU). However, traditional contact-based methods are uncomfortable and can even be harmful during long-term contact with sensors. The visual contactless pipeline thus offers a notable competitive advantage by providing a convenient, contactless approach to vital-sign monitoring 17. For instance, several pioneering deep learning studies have been conducted on hospitalized neonates 17,68,127,128.
Furthermore, the majority of clinical patients require vital-sign monitoring, particularly those who are critically ill, have had surgery or suffered from CVD 129 or hypertension.For patients who require long-term monitoring, traditional contact monitors have obvious clinical disadvantages.If the sensor probe is too tight, it can cause skin damage during extended use.Conversely, if the probe is too loose, it may easily detach due to the patient's movement and necessitate professional reattachment.The primary unmet need being addressed by non-contact monitoring solutions is the mitigation of patient discomfort caused by contact or wearable monitoring technology 130 .For instance, wearable sensors are difficult to use in some patients with cognitive impairment (e.g., Alzheimer's disease) 75 .

(D) Main challenges in clinical study
Compared with studies of healthy subjects or laboratory scenarios, the clinical application of VCPM faces a multitude of unique challenges, and the number of clinical studies is extremely limited. Studying VCPM techniques is therefore highly valuable for addressing digital healthcare challenges in real-world clinical scenarios 131. The following disadvantages of VCPM technologies must be taken into consideration when they are applied in clinical settings: (1) privacy protection; (2) the need for substantial clinical validation; (3) unsuitability for dark environments unless an infrared camera is used; (4) performance susceptible to disturbances such as head movement. In addition to these issues, this section discusses the primary obstacles that VCPM faces in clinical studies.

I. A shortage of public clinical databases
The primary challenge is the absence of a publicly available database of clinical scenarios. To date, some pioneering studies have been conducted on clinical patients, but none of their data are available because of patient privacy protection. Villarroel et al. studied the application of VCPM algorithms to post-operative patients 18 and preterm infants 68 in the intensive care unit (ICU), respectively. However, the corresponding databases are not publicly accessible. Moreover, only 15 ICU patients and 30 preterm infants were recruited in refs. 18 and 68, respectively. Such a limited amount of clinical data from a single center is insufficient to support further research and optimization of AI algorithms, as well as the clinical application of VCPM algorithms.
The lack of a public clinical database seriously impedes algorithmic and application innovation in the VCPM research community. First, the unavailability of clinical data raises the barrier to clinical VCPM research, so that many researchers cannot carry out such research smoothly. Second, there is no unified benchmark for comparing algorithms developed by different researchers. Last, it hinders the healthy and sustainable development of the research community. Overall, it is imperative and timely to establish a public database of clinical VCPM scenarios.
The main deterrent to releasing clinical data is the need to safeguard patients' privacy 132,133. Generally, the data recorded for VCPM include patients' facial videos and multiple physiological signals. Hence, giving researchers access to the data while ensuring privacy is a tricky issue. To this end, new standards for the privacy and disclosure of clinical databases need to be established through collaboration among government, academia and the medical community. Such guidelines could substantially advance the development and application of AI technologies in digital medicine. From a technical perspective, there are at least two potential solutions: an AI-based privacy-protection system or a simulated database for clinical studies.
On the one hand, the primary concept behind federated learning is to train machine learning models on datasets distributed across multiple devices or institutions without centralizing the raw data, thereby reducing the risk of data leakage 134,135. Federated learning has recently been widely applied in healthcare and clinical systems 57,58,136–138, yet only one VCPM study has leveraged it so far 63. On the other hand, a simulated dataset draws on the concept of digital twins, using both the original clinical videos and the corresponding physiological signals to construct a simulation database. Researchers would then need access only to the simulated dataset to develop their AI algorithms, and could adopt transfer learning to adapt the models trained on simulated data to real clinical data.
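To make the federated-learning idea concrete, the following is a minimal sketch of federated averaging (FedAvg) in Python with NumPy. It is an illustration under stated assumptions, not the method of the VCPM study cited above: a linear least-squares model stands in for a VCPM network, the three "hospitals" and their data are synthetic, and the helper names (`local_update`, `fed_avg`) are hypothetical. The property it demonstrates is that each site returns only model weights, while its raw videos and signals never leave the site.

```python
import numpy as np


def local_update(weights, X, y, lr=0.1, epochs=5):
    """Hypothetical site-side step: a few epochs of full-batch gradient descent
    on a least-squares linear model (a stand-in for a VCPM network).
    The raw data (X, y) stay inside this function; only weights are returned."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w


def fed_avg(sites, n_features, rounds=50):
    """Federated averaging: each round, every site refines the current global
    weights locally, and the server averages the returned weights in
    proportion to each site's sample count."""
    global_w = np.zeros(n_features)
    total = sum(len(y) for _, y in sites)
    for _ in range(rounds):
        local_ws = [local_update(global_w, X, y) for X, y in sites]
        global_w = sum((len(y) / total) * w for (_, y), w in zip(sites, local_ws))
    return global_w


# Usage: three synthetic "hospitals" whose private data share one underlying model.
rng = np.random.default_rng(0)
true_w = np.array([1.5, -0.5, 2.0])
sites = []
for n in (150, 200, 250):
    X = rng.normal(size=(n, 3))
    sites.append((X, X @ true_w))
w = fed_avg(sites, n_features=3)
print(np.round(w, 3))
```

Weighting by sample count keeps larger centres from being diluted by smaller ones. In a real deployment, secure aggregation or differential privacy would typically be layered on top, because shared weights can still leak some information about the local data.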

II. Complex clinical scenarios
Because the vital signs hidden in facial videos are extremely weak, they are susceptible to interference from the subject's status and surroundings. For instance, face occlusion and lateral face orientation weaken the physiological signal, while head motion and illumination changes introduce strong disturbances. Ultimately, these factors make it more challenging to develop a robust VCPM algorithm.

Face occlusion and lateral face orientation
Oxygen therapy is commonly applied to ICU patients, but the oxygen tubes and the fabric tape fixing them obscure parts of the face. Moreover, the tubes run across various regions of the face, so segmenting them out of every frame of the facial video is time-consuming and laborious. In addition, unlike healthy subjects in laboratory settings, clinical patients cannot be instructed to face the camera and, being confined to their sickbeds, typically present a laterally oriented face. Furthermore, to minimize background interference while retaining as much of the skin that carries physiological information as possible, non-skin regions should be eliminated before the video is fed into a VCPM algorithm. For healthy subjects' facial videos, facial landmark tools are commonly used to extract the facial region of interest (ROI), but such tools usually do not work on occluded or laterally oriented faces 139,140.
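As a simple illustration of discarding non-skin pixels before signal extraction, the sketch below swaps in a classical colour-threshold skin detector rather than the landmark- or SAM-based segmentation discussed in this review. The YCrCb bounds (`CR_RANGE`, `CB_RANGE`) are an assumed, commonly used heuristic, and the helper names are hypothetical. Because a colour rule does not depend on facial landmarks, it can still operate on lateral faces, and it excludes occluders such as oxygen tubes whenever their colour differs from skin.

```python
import numpy as np

# Assumed YCrCb skin-colour bounds (a commonly used heuristic); illustrative
# only, and they would need tuning for clinical lighting and skin tones.
CR_RANGE = (133.0, 173.0)
CB_RANGE = (77.0, 127.0)


def skin_mask(frame_rgb):
    """Boolean H x W mask of pixels whose chrominance falls in the skin range.
    frame_rgb: H x W x 3 array of 8-bit RGB values."""
    rgb = frame_rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b    # BT.601 luma
    cr = (r - y) * 0.713 + 128.0              # red-difference chroma
    cb = (b - y) * 0.564 + 128.0              # blue-difference chroma
    return ((cr >= CR_RANGE[0]) & (cr <= CR_RANGE[1])
            & (cb >= CB_RANGE[0]) & (cb <= CB_RANGE[1]))


def mean_skin_rgb(frame_rgb):
    """Average RGB over skin pixels only: one sample of the raw colour trace a
    classical VCPM method would go on to filter. NaN if no skin is visible."""
    mask = skin_mask(frame_rgb)
    if not mask.any():
        return np.full(3, np.nan)
    return frame_rgb[mask].astype(np.float64).mean(axis=0)


# Usage: a toy 4 x 4 frame, left half skin-coloured, right half a bluish "tube".
frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[:, :2] = (200, 140, 120)
frame[:, 2:] = (50, 100, 220)
mask = skin_mask(frame)
print(mask.sum(), mean_skin_rgb(frame))
```

Fixed thresholds of this kind are sensitive to illumination and skin tone, which is one reason learned segmentation is attractive for clinical footage.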

Head motion and illumination changes
Furthermore, the current bottleneck of VCPM solutions lies in their algorithmic performance, which fails to meet clinical accuracy requirements when subjects move their heads or the ambient illumination changes. The essential reason is that the amplitude of the weak vital signs concealed in facial videos is significantly lower than that of the noise caused by head motion and illumination changes. A few pioneering researchers have attempted to tackle these difficult issues, yet much work remains before VCPM technologies attain their full potential, particularly in the medical field.

III. Confidence evaluation of algorithms
Head movements or illumination changes can degrade the performance of VCPM algorithms. It is therefore reasonable to introduce a confidence evaluation to assess each result. Presenting a confidence coefficient in real time indicates how reliable the measured vital signs are. The confidence level can also serve as a metric of an algorithm's adaptability to clinical scenarios. Furthermore, it will not only help physicians assess patients' conditions, but also guide researchers toward further algorithmic improvement.
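One possible way to attach a confidence coefficient to each estimate is a spectral quality index. The sketch below is an assumption rather than a method from this review: it estimates HR from the dominant spectral peak in a 0.7–4 Hz band (42–240 bpm) and reports the fraction of in-band power concentrated around that peak and its first harmonic. The band limits, the ±0.1 Hz window and the function name `hr_with_confidence` are illustrative choices.

```python
import numpy as np


def hr_with_confidence(signal, fs, band=(0.7, 4.0), half_width=0.1):
    """Estimate HR (bpm) from a BVP trace and return a confidence in [0, 1].

    Confidence (an assumed definition) = power within +/- half_width Hz of the
    dominant in-band peak and of its first harmonic, divided by the total
    in-band power.
    """
    x = np.asarray(signal, dtype=np.float64)
    x = x - x.mean()
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    f_peak = freqs[in_band][np.argmax(power[in_band])]
    near_peak = in_band & ((np.abs(freqs - f_peak) <= half_width)
                           | (np.abs(freqs - 2.0 * f_peak) <= half_width))
    confidence = power[near_peak].sum() / power[in_band].sum()
    return 60.0 * f_peak, float(confidence)


# Usage: a clean 1.2 Hz (72 bpm) pulse versus the same pulse buried in noise.
fs = 30.0
t = np.arange(600) / fs  # 20 s at 30 fps
clean = np.sin(2 * np.pi * 1.2 * t)
noisy = clean + np.random.default_rng(1).normal(scale=2.0, size=t.size)
hr_clean, conf_clean = hr_with_confidence(clean, fs)
hr_noisy, conf_noisy = hr_with_confidence(noisy, fs)
print(round(hr_clean, 1), round(conf_clean, 2), round(conf_noisy, 2))
```

A value near 1 means the spectrum is dominated by a single periodic component; motion or illumination noise spreads power across the band and lowers the score, which could then be displayed alongside the HR reading.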

IV. Pathological feasibility analysis
In clinical practice, hypertension and CVDs are highly prevalent among the elderly, giving rise to various abnormal PPG signals. As shown in Fig. 8, the morphological characteristics of abnormal PPG signals differ greatly from those of normal PPG waveforms. As illustrated in Fig. 1, the fundamental principle of VCPM algorithms is built on normal PPG signals. Therefore, developing a clinical VCPM algorithm that can adapt to abnormal PPG signals and still infer HR, RR, BP and SpO2 is a tough task, and one that urgently needs validation in clinical studies.
However, no study has so far investigated the impact of abnormal pathological PPG signals on the performance of VCPM algorithms; all studies assume that subjects have normal PPG waveforms. Hence, comprehensive research is needed to ensure that VCPM algorithms remain effective and robust for abnormal PPG signals in clinical situations, which is a challenging task and an innovative future direction for developing effective VCPM pipelines. Furthermore, VCPM-based technologies could be developed for the diagnosis of CVDs.
Figure 8 displays five PPG waveforms, including four abnormal PPG signals commonly seen in clinical settings. Such abnormal PPG waveforms were also detected in our clinical studies using a finger-clip sensor. The waterhammer PPG is characterized by a sudden increase in the amplitude of the PPG signal, followed by a gradual decrease. The slow-rising PPG is identified by a rise time, i.e., the duration from the onset of the blood volume change to the peak of the signal, that is longer than that of normal signals.
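The rise-time definition above can be turned into a per-beat measurement. The following sketch assumes an already filtered PPG trace; the peak detector `systolic_peaks` is a deliberately simple hypothetical helper (a library peak finder would normally be used instead), and the triangular test waveforms are synthetic stand-ins for a normal and a slow-rising PPG, not clinical data.

```python
import numpy as np


def systolic_peaks(x, min_distance):
    """Indices of local maxima, keeping the tallest ones at least min_distance apart."""
    cand = np.where((x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:]))[0] + 1
    kept = []
    for i in cand[np.argsort(x[cand])[::-1]]:  # tallest candidates first
        if all(abs(i - k) >= min_distance for k in kept):
            kept.append(int(i))
    return np.array(sorted(kept), dtype=int)


def rise_times(ppg, fs, max_hr_bpm=180.0):
    """Per-beat rise time (s): from the onset (minimum between consecutive
    systolic peaks) to the next systolic peak. The first peak is skipped
    because its onset may precede the start of the recording."""
    x = np.asarray(ppg, dtype=np.float64)
    peaks = systolic_peaks(x, min_distance=int(fs * 60.0 / max_hr_bpm))
    out = []
    for prev, cur in zip(peaks[:-1], peaks[1:]):
        onset = prev + int(np.argmin(x[prev:cur]))
        out.append((cur - onset) / fs)
    return np.array(out)


def synthetic_ppg(rise_samples, period=100, beats=5):
    """Toy triangular PPG: linear upstroke over rise_samples, linear decay after."""
    beat = np.concatenate([
        np.linspace(0.0, 1.0, rise_samples, endpoint=False),
        np.linspace(1.0, 0.0, period - rise_samples, endpoint=False),
    ])
    return np.tile(beat, beats)


fs = 100.0
normal = rise_times(synthetic_ppg(15), fs)       # 0.15 s upstroke
slow_rising = rise_times(synthetic_ppg(30), fs)  # 0.30 s upstroke
print(normal, slow_rising)
```

On real recordings, the onset is often located with derivative-based rules rather than a plain minimum, since baseline wander can shift the minimum away from the true foot of the wave.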
Pulsus bisferiens, meaning "beating twice", is a type of arterial pulse characterized by two distinct systolic peaks: a rapid rise in blood pressure during systole is followed by a brief fall and then a second rise. It is most commonly associated with aortic regurgitation, in which incomplete closure of the aortic valve during diastole causes retrograde blood flow into the left ventricle and, consequently, an increased stroke volume.
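Because pulsus bisferiens is defined by two systolic peaks within one beat, a rough screening feature is the number of prominent local maxima per beat. The sketch below assumes the beat has already been segmented from onset to onset; the function name, the 50% relative-height threshold and the Gaussian-bump test beats are illustrative assumptions, not a validated detector.

```python
import numpy as np


def count_systolic_peaks(beat, rel_height=0.5):
    """Count local maxima in one segmented beat (onset to onset) that rise above
    rel_height of the beat's amplitude range. rel_height is an assumed threshold
    intended to ignore small ripples."""
    b = np.asarray(beat, dtype=np.float64)
    threshold = b.min() + rel_height * (b.max() - b.min())
    inner = b[1:-1]
    is_peak = (inner > b[:-2]) & (inner >= b[2:]) & (inner >= threshold)
    return int(is_peak.sum())


# Synthetic beats: one systolic bump versus two closely spaced systolic bumps.
t = np.linspace(0.0, 1.0, 100)
single = np.exp(-((t - 0.25) / 0.08) ** 2)
bisferiens = (np.exp(-((t - 0.20) / 0.05) ** 2)
              + 0.9 * np.exp(-((t - 0.32) / 0.05) ** 2))
print(count_systolic_peaks(single), count_systolic_peaks(bisferiens))
```

The dicrotic wave of a normal PPG is also a local maximum. The threshold is meant to suppress it, but a pronounced dicrotic wave could still be counted, so this feature alone cannot establish bisferiens.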
Pulsus alternans is characterized by beat-to-beat alternation in the strength of the arterial pulse due to variations in stroke volume: beats with a smaller stroke volume produce weaker pulses, and those with a larger stroke volume produce stronger ones. It is most commonly associated with left ventricular dysfunction, such as that observed in heart failure.
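Pulsus alternans can likewise be screened from the sequence of per-beat pulse amplitudes. The sketch assumes these amplitudes have already been extracted (for example, peak minus onset value for each beat); the function name `alternans_index` and the returned ratio are illustrative, not an established clinical index.

```python
import numpy as np


def alternans_index(beat_amplitudes):
    """Screen a series of per-beat pulse amplitudes for pulsus alternans.

    Returns (alternating, ratio): alternating is True when every successive
    amplitude change flips sign (strong, weak, strong, ...); ratio is the mean
    amplitude of the weaker beat group divided by that of the stronger group
    (1.0 means no difference in strength between alternate beats).
    """
    a = np.asarray(beat_amplitudes, dtype=np.float64)
    d = np.diff(a)
    alternating = bool(len(d) >= 2 and np.all(d[:-1] * d[1:] < 0))
    even, odd = a[0::2].mean(), a[1::2].mean()
    ratio = min(even, odd) / max(even, odd)
    return alternating, float(ratio)


alt_case = alternans_index([1.0, 0.7, 1.0, 0.7, 1.0, 0.7])     # strong/weak pattern
steady_case = alternans_index([1.0, 1.0, 1.0, 1.0, 1.0, 1.0])  # constant strength
print(alt_case, steady_case)
```

Requiring a strict sign flip on every beat is deliberately conservative; with real, noisy amplitudes a tolerance or a majority rule would likely be needed.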

Search results
We retrieved a total of 381, 1279 and 1243 records from three databases (PubMed, Web of Science (WOS) and IEEE), respectively (Fig. 9). Because VCPM research began in 2007, we first applied a time filter, which left 345, 1026 and 1139 records, respectively. Subsequently, the built-in filters of the three databases were applied: (1) in PubMed, 28 records without full text were excluded; (2) in WOS, after excluding reviews, unspecified material, books, abstract-only records and letters, 730 papers remained; (3) in IEEE, filtering by publication topic (patient monitoring, medical image processing, medical signal processing, cardiology, diseases, health care, biomedical optical imaging, telemedicine, medical signal detection or cardiovascular system) left 993 records. After duplicates were removed, 1943 items remained. Finally, during screening by title, abstract or full text, studies conducted in laboratory settings or using radar sensors (e.g., MMW radar) were excluded; only research papers related to clinical settings, digital medicine, telemedicine or healthcare were selected for final analysis, resulting in a total of 43 papers.

Fig. 8 The distinct types of abnormal PPG waveforms (panels include Normal, Waterhammer, Slow-rising and Pulsus alternans).
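The screening counts reported above can be reconciled with a short tally. All numbers come from the search-results text except the number of duplicates, which is inferred here as the difference between the pooled post-filter total and the 1943 de-duplicated records.

```python
# Record counts per database, as reported in the text.
retrieved = {"PubMed": 381, "WOS": 1279, "IEEE": 1243}
after_time_filter = {"PubMed": 345, "WOS": 1026, "IEEE": 1139}
# Built-in filters: PubMed drops 28 records without full text;
# the WOS and IEEE counts are taken directly from the text.
after_builtin = {"PubMed": after_time_filter["PubMed"] - 28, "WOS": 730, "IEEE": 993}

pooled = sum(after_builtin.values())       # records entering de-duplication
after_dedup = 1943                         # reported in the text
duplicates_removed = pooled - after_dedup  # inferred here, not stated in the text
included = 43
excluded_at_screening = after_dedup - included

print(sum(retrieved.values()), sum(after_time_filter.values()), pooled,
      duplicates_removed, excluded_at_screening)
# prints: 2903 2510 2040 97 1900
```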

Study characteristics
The 43 research papers are listed in reverse chronological order in Table 2. In 24 papers the subjects were neonates or premature infants, in another 18 they were adult patients, and in the remaining one they included both newborns and children 141. In addition, Batbayar et al. developed a rapid preliminary COVID-19 screening system that integrates a stereo depth camera, an RGB camera and a thermal camera to measure RR, HR and body temperature (BT), respectively 142. Six studies (Villarroel et al., 2020, 2019, 2014; Chaichulee et al., 2019, 2018; Jorge et al., 2022) 18,68,143–146 came from the same team at Oxford University; four of them focused on neonates and the remaining two on adult patients.
In terms of the implemented algorithms, 29 papers (29/43, 67.4%) used classical methods and only ten (10/43, 23.3%) used AI-based methods, while the remaining four (4/43, 9.3%) did not explicitly state their methods. Among the ten AI-based papers, five came from the Oxford University team (UK) 18,68,143,145,146, two from Beihang University (China) 17,147, and one each from RWTH Aachen University (Germany) 148, the Indian Institute of Technology Madras (India) 127 and the Institute of Computer Science, FORTH (Greece) 149.
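The proportions in the paragraph above follow directly from the reported counts; the short check below reproduces them, together with the breakdown of the AI-based papers by institution.

```python
# Method categories of the 43 included clinical papers, as reported in the text.
counts = {"classical": 29, "AI-based": 10, "not stated": 4}
total = sum(counts.values())
shares = {k: round(100 * v / total, 1) for k, v in counts.items()}

# Institutional breakdown of the AI-based papers, as reported in the text.
ai_by_team = {"Oxford": 5, "Beihang": 2, "RWTH Aachen": 1, "IIT Madras": 1, "FORTH": 1}

print(total, shares, sum(ai_by_team.values()))
# prints: 43 {'classical': 67.4, 'AI-based': 23.3, 'not stated': 9.3} 10
```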
Additionally, Fig. 10 plots the number of subjects against the total video length for the studies in Table 2. Although the 23 available data pairs are too few to be statistically conclusive, Fig. 10 suggests that, when the studies from the same team inside the blue ellipse are excluded, AI-based methods commonly used larger datasets (in total video length or number of subjects) than non-AI approaches. This is consistent with the fact that the performance of AI-based approaches depends on the scale of the database, whereas classical methods require fewer data samples. Nevertheless, for AI-based and classical approaches alike, a large-scale and diverse database is indispensable for assessing the comprehensive performance of a VCPM approach in complex real-world clinical settings. In addition, although contactless studies in clinical scenarios have been emerging, critical physiological indicators such as SpO2 and BP have not yet been studied in clinical settings. Table 3 contrasts the physiological parameters studied in laboratory and clinical scenarios. Based on the selected clinical studies and state-of-the-art (SOTA) research trends in VCPM, our findings are summarized below:
• All clinical studies are based on their own databases. The vast majority focus on assessing clinical applications rather than addressing the difficulties of real-world clinical scenarios, researching novel paradigms or analyzing clinicopathology.
• As shown in Tables 2 and 3, all clinical studies concentrate on the measurement of HR or RR. Even though AI-based research on SpO2 or BP measurement in healthy subjects has grown explosively since 2021 90,96, none of it targets SpO2 or BP monitoring in clinical settings.
• VCPM algorithms developed for clinical settings are increasingly favored by researchers; in particular, the number of such studies rose sharply in 2022.
• There is a large gap between healthy/laboratory scenarios and clinical settings in research on AI-based methods. Many SOTA AI-based methods have been developed in healthy/laboratory settings 20, but none of them has yet been applied in clinical settings.

Phenomenon: the gap between the laboratory and clinical settings
(1) Studies of AI-based VCPM algorithms on healthy subjects/laboratory settings grew exponentially from 2021 to 2023. As shown in Table 3, the SOTA algorithms have demonstrated outstanding performance, but have rarely been generalized to clinical application. (2) Studies of VCPM algorithms on patients/clinical settings soared from 2021 to 2023, yet only a limited number of them incorporated AI-based algorithms: in Table 2, only 6 papers (6/21, 28.6%) from 2021–2023 utilized AI technologies. (3) In terms of methodological novelty, unsupervised learning 47,48,51, Transformers 49,150–152, GANs 115–117, meta-learning 153 and Graph Neural Networks 154,155 have been developed for non-clinical scenarios, but these technologies are rarely used in clinical settings. (4) As illustrated in Table 3, although AI-based contactless SpO2 and BP estimation has become a hot topic in the past two years, all such studies have concentrated on healthy people rather than hospital patients.
Reason: thinking and inference

Fig. 1 An overview of the physiological principles of VCPM technologies for monitoring multiple physiological parameters. a A schematic representation of the cardiopulmonary circulation system. Because of the cardiopulmonary interaction in oxygen transport between the heart and lungs, respiratory rate information is implicitly reflected in hemodynamics. b The skin reflection model of blood volume pulse (BVP) signal monitoring and the hemodynamics varying with the heartbeat. c Different body sites employed to extract PPG signals. d PPG signals from various body sites with RGB channels. e The vital signs derived from PPG waveforms. f The AI model for cardiopulmonary status assessment and disease diagnosis. Subgraphs (a–c and e) are designed by Freepik.

Fig. 3 Skin segmentation results of the SAM online demo in clinical scenarios. a The facial region in a recorded image of one of our ICU patients. b Full-image automatic segmentation. c The results of the automatic segmentation solution. d The interactive manual segmentation process. The rectangle denotes the selected region, and dots represent the areas to be removed or retained. e The results of interactive segmentation. The source images are designed by Freepik.

Fig. 4 The VCPM-based telemedicine/telehealth system employed for personalized disease diagnosis and public health management. a Two typical application scenarios. VCPM relies solely on ubiquitous cameras to capture video data. b The internet infrastructure, including both wireless and wired networks, facilitates the transmission and storage of data across the globe. c The AI model of physiological monitoring based on individual video data, and the AI model for large-scale decision-making that incorporates multi-source information fusion based on global patient data. d The upper subgraph denotes personal health care in a telemedicine system, while the lower subgraph depicts public health policy decision-making based on global patient information and horizontal relationships. The elements of subfigures are designed by Freepik.

Fig. 5 The development trend of vital-sign monitoring of neonates or preterm infants. a The conventional contact monitoring approach with hard-wired devices and rigid sensors adhered to neonatal skin. b Wireless, non-invasive soft biosensors employed to monitor physiological signals in NICU or pediatric ICU (PICU) settings, e.g., the research of literature 120. c The video-based non-contact vital-sign monitoring solution used in the NICU, such as the study of Oxford University 68.
Fig. 6

Fig. 7 Application scenarios of VCPM. Sub-figure (a) is designed by our team, and (b–i) are designed by Freepik.

Fig. 9 Flowchart for literature search and screening.

Table 1. Recommended settings for the clinical application of VCPM technologies.
… The lower the video compression rate, the easier it is to observe the BVP signals hidden in skin videos.
Measurement distance: According to the focal length of the camera, ensure that the image of the skin region is clear and that the skin area is larger than 128 × 128 px (the input image size of AI-based methods is usually set to 128 × 128 px).
Measurement of BP and SpO2: Comprehensive and systematic verification is required to establish feasibility in clinical practice. The BP and SpO2 test database needs to cover the ranges of 40–240 mmHg and 70–100%, respectively.

… 110,115–117 … oxygen saturation is a challenging task, which currently precludes the incorporation of oxygen saturation and blood pressure information into simulated data.

Domain adaptation. Because of the bias (e.g., in illumination, or between distinct clinical centers) between the training source domain and the testing target domain, the generalization ability of deep learning-based methods needs to be improved. To improve the generalization ability of rPPG models, Du et al. proposed a domain adaptation method that aligns intermediate domains and synthesizes target noise in the source domain, achieving superior noise reduction by reducing the domain discrepancy 110. We believe that domain adaptation can be extended to effectively mitigate the disparity between laboratory data and clinical data.

GAN-based VCPM technologies. A generative adversarial network (GAN) is an unsupervised learning framework for estimating generative models via adversarial training 114. GANs are widely used for data generation, data augmentation, style transfer and similar tasks. Recently, GANs have been introduced to improve the performance and generalization of VCPM technologies 110,115–117. In particular, GANs have been used to generate adversarial noise that improves the generalization ability of PPG prediction models 110,115. Although these achievements were made in studies on healthy subjects and laboratory settings, the approach also has significant value in clinical scenarios. Owing to the complexity of clinical scenarios and their differences across clinical centres, the generalization performance of AI-based VCPM algorithms may degrade when they are applied to other clinical scenarios. GAN-based VCPM technology is therefore a potential approach to alleviating the generalization difficulty across multiple centres.

DISCUSSION
In this section, we discuss (A) the digital medicine revolution; (B) the merits of VCPM technologies; (C) the necessity of clinical settings; and (D) the main challenges in clinical study.
• Telehealth. Telehealth encompasses remote clinical healthcare, patient and professional health education, and public health and healthcare administration. Telehealth usually covers a significant proportion of digital health solutions.
• Telemedicine. There is currently no universally accepted definition of telemedicine. Generally, it is the use of telecommunications to provide healthcare services remotely, encompassing a wide range of applications such as video consultations, diagnosis and patient monitoring. It can be

Table 2. A summary table of clinical studies based on VCPM algorithms (search cutoff date: May 22, 2023).